Research in Computing Science, Vol. 70, pp. 79-91, 2013.
Abstract: Named entity recognition is an involved task that usually requires numerous resources. Recognizing Arabic entities is even more difficult due to the inherent ambiguity of the Arabic language. Previous approaches to Arabic named entity recognition have used Arabic parsers and taggers combined with a huge set of gazetteers and sometimes large training sets. However, the recent surge in the usage of social media, where colloquial Arabic rather than Modern Standard Arabic is used, invalidates these approaches because existing parsers fail to parse colloquial Arabic at an acceptable level of precision. To address these limitations, this paper presents an approach for recognizing Arabic persons’ names without utilizing any Arabic parsers or taggers. The approach uses only a limited set of publicly available dictionaries, which it integrates with a statistical model based on association rules for extracting patterns that indicate the occurrence of persons’ names. Through experimentation on a benchmark dataset, we show that the performance of the presented technique is comparable to that of the state-of-the-art machine learning approach.
Keywords: Arabic Named Entity Recognition, Association Rules, Colloquial Arabic, Modern Standard Arabic
PDF: A Novel Approach for Detecting Arabic Persons’ Names using Limited Resources